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Abstract — Smartphones and mobile devices are rapidly becom- 
ing indispensable devices for many users. Unfortunately, they 
also become fertile grounds for hackers to deploy malware and 
to spread virus. There is an urgent need to have a '■'■security 
analytic & forensic system" which can facilitate analysts to 
examine, dissect, associate and correlate large number of mobile 
applications. An effective analytic system needs to address the 
following questions: How to automatically collect and manage a 
high volume of mobile malware? How to analyze a zero-day suspi- 
cious application, and compare or associate it with existing malware 
families in the database? How to perform information retrieval so 
to reveal similar malicious logic with existing malware, and to 
quickly identify the new malicious code segment? In this paper, 
we present the design and implementation of DroidAnalytics, a 
signature based analytic system to automatically collect, manage, 
analyze and extract android malware. The system facilitates 
analysts to retrieve, associate and reveal malicious logics at the 
"opcode level". We demonstrate the efficacy of DroidAnalytics 
using 150,368 Android applications, and successfully determine 
2,494 Android malware from 102 different families, with 342 of 
them being zero-day malware samples from six different families. 
To the best of our knowledge, this is the first reported case in 
showing such a large Android malware analysis/detection. The 
evaluation shows the DroidAnalytics is a valuable tool and is 
effective in analyzing malware repackaging and mutations. 

I. Introduction 

Smartphones are becoming prevailing devices for many peo- 
ple. Unfortunately, malware on smartphones is also increasing 
at an unprecedented rate. Android OS-based systems, being 
the most popular platform for mobile devices, have been 
a popular target for malware developers. As stated in [1], 
the exponential growth of mobile malware is mainly due to 
the ease of generating malware variants. Although there are 
number of works which focus on Android malware detection 
via permission leakage, it is equally important to design a 
system that can perform comprehensive malware analytics: 
analyze and dissect suspicious applications at the opcode 
level (instead at the permission level), and correlate these 
applications to existing malware in the database to determine 
whether they are mutated malware or even zero-day malware, 
and to discover which legitimate applications are infected. 
Challenges: To realize an effective analytic system for 
Android mobile applications, we need to overcome several 
technical hurdles. First, how to systematically collect malware 
from the wild. As indicated in [|21, new malware variants 



are always hidden in many different third-party markets. Due 
to the competition of anti-virus companies and their fear 
of accidentally releasing malware to the public, companies 
are usually reluctant to share their malware database to re- 
searchers. Researchers in academic can only obtain a small 
number of mobile malware samples. Hence, how to automate 
a systematic process to obtain these malicious applications is 
the first hurdle we need to overcome. 

The second hurdle is how to identify repackaged applica- 
tions (or mutated malware) from the vast ocean of applications 
and malware. As reported in (|3], hackers can easily transform 
legitimate applications by injecting malicious logic or obfus- 
cated program segments so that they have the same structure 
as the original application but contain malicious logic. Thus, 
how to determine whether an application is a repackaged or 
obfuscated malware, and which legitimate applications are 
infected is very challenging. 

The third hurdle is how to associate malware with existing 
malware (or application) so as to facilitate security analysis. 
The existing approach of using cryptographic hash or package 
name as an identifier is not effective because hackers can easily 
change the hash value or package name. Currently, security 
analysts need to go through a laborous process of manually 
reverse engineer a malware to discover malicious functions and 
structure. There is an urgent need to have an efficient method 
to associate malware with other malware in the database, so 
to examine their commonalities at the opcode level. 
Contributions: To address these problems mentioned above, 
we present the design and implementation of DroidAnalytics, 
an Andorid malware analytic system for malware collection, 
signature generation, information retrieval, and malware asso- 
ciation based on similarity score. Furthermore, DroidAnalytic 
can efficiently detect zero-day repackaged malware. The con- 
tributions of our system are: 

• DroidAnalytics automates the processes of malware col- 
lection, analysis and management. We have successfully 
collected 150,368 Android applications, and determined 
2,494 malware samples from 102 families. Among those, 
there are 342 zero-day malware samples from six dif- 
ferent malware families. We also plan to release the 
malware database to the research community (please refer 
to |https://dl.dropbox.com/u/37123887/malware.pdf/ . 



• Droid Analytics uses a multi-level signature algorithm 
to extract the malware feature based on their semantic 
meaning at the opcode level. This is far more robust than 
a cryptographic hash of the entire appUcation. We show 
how to use DroidAnalytics to combat against malware 
which uses repackaging or code obfuscation, as well as 
how to analyze malware with dynamic payloads (see 

Sec. Unli. 

• Unlike previous works which associate malware via ''per- 
mission", DroidAnalytics associates malware and gen- 
erates signatures at the app/class/method level. Hence, 
we can easily track and analyze mutation, derivatives, 
and generation of new malware. DroidAnalytics can 
reveal malicious behavior at the method level so to iden- 
tify repackaged malware, and perform class association 
among malware/applications (see Sec. lIVI i. 

• We show how to use DroidAnalytics to detect zero- 
day repackaged malware. We have found 342 zero-day 
repackaged malware in six different families (see Sec.lVt. 

II. Design & Implementation of DroidAnalytics 

Here, we present the design and implementation of Droid- 
Analytics. Our system consists of modules for automatic 
malware collection, signature generation, information retrieval 
and association, as well as similarity comparison between 
malware. We will also show how to use these functions to 
detect zero-day repackaged malware. 




Fig. 1. The Architecture of the DroidAnalytics 

A. Building Blocks of DroidAnalytics 

Figure [T] depicts the architecture of DroidAnalytics and its 
components. Let us explain the design of each component. 

• Extensible Crawler: In DroidAnalytics, we implement an 
application crawler based on Scrapy |4|. Users can specify 
official or third party market places, as well as blog sites and 
the crawler will perform regular mobile application download. 
The crawler enables us to systematically build up the mobile 
applications database for malware analysis and association. So 
far, we have collected 150,368 mobile applications and carried 
out detailed security analysis. 

• Dynamic Payloads Detector: To deal with malware which 
dynamically downloads malicious codes via the Internet or 



attachment files, we have implemented the dynamic payloads 
detector component, which determines malicious trigger code 
within malware packages and tracks the downloaded appli- 
cation and its behavior in virtual machine. Firstly, it scans 
the package to find suspicious files such as .elf or .jar 
file. Hackers usually camouflage malicious files by changing 
their file type. To overcome this, this component scans all files 
and identifies files using their magic numbers instead of file 
extension. Secondly, if an application has any Internet behavior 
(e.g., Internet permission or re-delegating other applications 
to download files fSl), the dynamic payloads detector will 
treat these files as the target, then runs the application in 
the emulator. The system will use the forward symbolic 
execution technique to trigger the download behavior [6|. 
Both the suspicious files within the package and dynamically 
downloaded files from the Internet will be sent to the signature 
generator (which we will shortly describe) for further analysis. 

• Android App Information (AIS) Parser: AIS is a data 
structure within DroidAnalytics and it is used to represent 
. apk information structure. Using the AIS parser, analysts 
can reveal the cryptographic hash (or other basic signature) 
of an . apk file, its package name, permission information, 
broadcast receiver information and disassembled code, . . ., 
etc. Our AIS parser decrypts the AndroidManif est . xml 
within an application and disassembles the . dex file into 
.small code. Then it extracts package information from 
source code and retains it in AIS so analysts can easily retrieve 
this information. 

• Signature Generator: Anti-virus companies usually use 
cryptographic hash, e.g., MD5, to generate a signature for an 
application. This has two major drawbacks. Firstly, hackers 
can easily mutate an application and change its cryptographic 
hash. Secondly, the cryptographic hash does not provide 
sufficient flexibility for security analysis. In DroidAnalytics, 
we use a three-level signature generation scheme to identify 
each application. This signature scheme is based on the mobile 
application, classes, methods, as well as malware's dynamic 
payloads (if any). Our signature generation is based on the 
following observation: For any functional application, it needs 
to invoke various Android API calls, and Android API calls 
sequence within a method is difficult to modify (unless one 
drastically changes the program's logic, but we did not find 
any from the 150,368 applications we collected that used 
this obfuscation technique). Hence, we generate a method's 
signature using the API call sequence, and given the signature 
of a method, create the signature of a class which composes 
of different methods. Finally, the signature of an application is 
composed of all signatures of its classes. We like to emphasize 
that our signature algorithm is not only for defense against 
malware obfuscation, but more importantly, facilitating mal- 
ware analysis via class/method association (we will show in 
later sections). Let us present the detail of signature generation, 
(a) Android API calls table: Our system uses the API calls 
table of the Android SDK. The android, jar file is the 
framework package provided by the Android SDK. We use the 
Java reflection [7| to obtain all descriptions of the API calls. 



For each API, we extract both the class path and the method 
name. We assign each full path method a hex number as part of 
the ID. For the current version of DroidAnalytics, we extract 
47,126 full path methods in the Android SDK 4.1 version as 
our API calls table. Table I] depicts a snapshot of API calls 
table, e.g., android/content/intent; -><init> is 
assigned an ID 0x30291. 



Full Path Method 


Method ID 


android/account s /Account ; -><init> 

android/ content /Intent ; -><init> 
android/content/ Intent ; ->toUri 
android/telephony /SmsManager; ->get Default 
android/ app/Pendinglntent; ->get Broadcast 


0x00001 

0x30291 
0x30292 
0x39D53 
0xF3E91 



TABLE I 

Example of the Android API Calls Table and assigned IDs 



const/4 v2, 0x0 

new-instance v3, Landroid/content/lntent; 
const-string v5, "SENT" 

invoke-direct {v3, v5}, Landroid/content/lntent;-><init>(Ljava/lang/String;)V 

const/4 v5, 0x0 

invoke-static {v1, v2, v3, v5}, Landroid/app/Pendinglntent;-> getBroadcast\ 
(Landroid/content/Context;ILandroid/content/lntent;l} Landroid/app/Pendingintent; 

move-result-object v4 

.local v4, pi:Landroid/app/Pendinglntent; 

invoke-static {}, Landroid/telephony/SmsManager;-> getDefault()\ 
Landroid/telephony/SmsManager; 

move-result-object vO 



android/content/lntent;-><init>() 

android/app/Pending[ntent;->getBroadcast() 

android/telephony/SmsManager;->getDefault() 


API Calls Table 


-> 0x30291 
-> 0xF3E91 
-> 0x39053 











1 * 

0x30291 0xF3E91 0x39053 




Method Sign: AAAE1 





Fig. 2. The Process of Lev3 Signature Generation 



(b) Disassembling process: Each Android application is 
composed of different classes and each class is composed of 
different methods. To generate signatures for each class or 
method, DroidAnalytics first disassembles an . apk file, then 
takes the Dalvik opcodes of the . dex file and transforms them 
to methods and classes. Then DroidAnalytics uses the Android 
API calls table to generate signatures. 

(c) Generate Lev3 signature (or method signature): The 
system first generates a signature for each method and we 
call this the Lev3 signature. Based on the Android API calls 
table, the system extracts the API call ID sequence as a string 
in each method, then hashes this string value to produce the 
method's signature. Figure |2] illustrates how to generate the 
Lev3 signature of a method which sends messages to another 
mobile phone. Figure |2| shows that the method contains three 
API calls. Using the Android API calls table (as in Table |I|, 
we determine their IDs. Signature of a method is generated by 
cancatenation of all these IDs. Note that DroidAnalytics will 
not extract the API calls which will not be executed in run time 
because these codes are usually generated via obfuscation. 
Fuithermore, if a method (except the main method) will not be 
invoked by any other methods, signature generator will also 
ignore this method because this may be a defunct method 
generated by malware writers. 

(d) Generate Lev2 signature (either class signature or 
dynamic payload signature): Next, DroidAnalytics proceeds 
to generate the Lev2 signature for each class, and it is based 
on the Lev3 signatures of methods within that class. Malware 
writers may use various obfuscation or repackaging techniques 
to change the calling order of the methods table in a . dex file. 
To overcome this problem, our signature generation algorithm 
will first sort the level 3 signatures within that class, and then 
concatenate all these level 3 signatures to form the level 2 
signature. 

Some malicious codes are dynamically downloaded from 
the Internet during execution. DroidAnalytics uses the dynamic 
payloads detector component to obtain the payload files. For 



the dynamic payloads which are .dex file or .jar file, 
DroidAnalytics treats them as classes within the malware. 
Given these files, the system checks their API call sequence 
and generates a Lev2 signature for each class within an 
application. For the dynamic payloads which contain, say, 
.elf file or . s o file, DroidAnalytics treats them as a single 
class within that malware, then uses the cryptographic hash 
value (e.g., MD5) of the payload as its Lev2 signature. For 
the dynamic payloads which are .apk files, DroidAnalytics 
treats each as a new application and a class within the 
malware. DroidAnalytics first uses the cryptographic hash 
value (e.g., MD5) of the new . apk file as one Lev2 signature 
of that malware. Because the payload is a new application, 
DroidAnalytics will use the method we discussed to carry out 
a new signature generation. 

(e) Generate Levi signature (or application signature): 

The Levi signature is based on the level 2 signatures, e.g., 
signatures of all qualified classes within an application. In 
addition, the signature generator will ignore the class (except 
the main class) which will not be invoked by any other classes 
since these defunct classes may be generated via obfuscation. 
Malware writers may use some repackaging or obfuscation 
techniques to change the order of the classes table of the . dex 
file, our signature algorithm will first sort all Lev2 signatures, 
then concatenate these Lev2 signatures to generate the Levi 
signature. 

Figure [3] summarizes the framework of our signature algo- 
rithm. For example, the Lev3 signatures of AAAEl and B2 3E8 
are the two method signatures within the same class. Based on 
these two (sorted) signatures, we generate the Lev2 signature 
of the corresponding class, which is 53EB3. Note that the 
Lev2 signature of C3EB3 is generated from a . dex file which 
is a dynamic payload used to execute the malicious behavior 
Based on all sorted Lev2 signatures of all classes, we generate 
the Levi signature, F32DE, of the application. 

For the current DroidAnalytics platform, we use a server 
which is of 2.80 GHz Duo CPU processor, 4GB memory and 




lods, classes 
signatures 



lamic payload 
signatures 



Fig. 3. Illustration of signature generation: the application (Levi) signature, 
class level (Lev2) signatures and method level (Lev3) signatures. 

2 TB hard disk, with two virtual machines in the server to 
implement the anti-virus engine. We carried out experiment 
to study the processing time to scanning and generating 
signatures. On average, it takes around 60 seconds to scan one 
application (includes the dynamic analysis), and around three 
seconds to generate all three level signatures, five seconds to 
generate AIS information, and one second to insert information 
into the database. As of November 2012, our system have 
downloaded 150,368 mobile applications from the following 
places: Google Play [SJ, nine Android third party markets 
(e.g., m-HH), two malware forums lfT2l . lfT3l and one mobile 
malware share blog [jSJ. The size of all downloaded application 
is 468GB. 

III. Utility & Effectiveness of Signature Based System 

Here, we illustrate how DroidAnalytics' signatures can 
be used to analyze (and detect) malware repackaging, code 
obfuscation and malware with dynamic payloads. 
A. Analyzing Malware Repackaging Repackaging obfus- 
cation technique is commonly used by malware writers to 
change the cryptographic hash value of an . apk file without 
modifying the opcodes of the . dex file. This technique is 
different from the repackaged technique which is to inject 
new packages into the legitimate apphcations. For example, 
using Jarsigner utility of Java SDK to re-sign an . apk 
file only changes the signature part of one . apk file, and 
generates a new . apk file which preserves the same logic 
and functionality as the original one. Another example is 
using the Apktool lfT4ll . which is a reverse engineering tool 
to disassemble and rebuild an .apk file without changing 
any assembly code. Although there is no modification on the 
assembly codes, the recompiler may change the classes order 
and methods order during the recompiling process. Therefore, 
repackaging obfuscation is often used to mutate an existing 
malware to generate a new version with a different signature. 
If an anti-virus system only identifies malware based on a 
cryptographic hash signature, then repackaging obfuscation 
techniques can easily evade the detection. 

DroidAnalytics can detect malware which is generated by 
repackaging obfuscation. Since there is no modification on 



the opcodes within the . dex file, and DroidAnalytics first 
sorts Lev2 and Lev3 signatures before generating the Levi 
signature. Therefore, DroidAnalytics will generate the same 
signature as the original even when one repackages the . apk 
file. 

Experiment. To illustrate the above claim, we carry out 
the following experiment and Figure |4] illustrates the results. 
Opfake is a server-side polymorphism malware. The malware 
mutates automatically when it is downloaded. When analysts 
compare the cyclic redundancy codes (CRCs) of two Opfake 
downloads, it shows that the only meaningful change happens 
in the file data . db which is located in "res/ raw/" folder 
The modified data.db changes the signature data for the 
package in "META-INF" folder By analyzing this form of 
malware, we find that all mutations in Opfake family happen 
in the same opcode (stored in classes . dex). Hence, our 
signature system will generate the same level 1 signature for 
all mutations in this malware family. 

Figure |5] illustrates another example of using DroidAn- 
alytics to analyze the Km in family. We first calculate the 
Levi signatures of all 150,368 applications in our database. 
After the calculation, the result shows that the most fre- 
quent signature (which corresponds to the Levi signature 
90b3d4af 183e9f 72f 818c629b486fdec) comes from 
117 files and all these files have different MD5 values. This 
shows that conventional cryptographic hashing (i.e., MD5) 
cannot identify malware variants but DroidAnalytics can ef- 
fectively identify them. Also, these 117 files are all variants 
of the Km in family. After further analysis, we discover that 
the Km in family is a wallpaper changer application, and all 
its variants have the same application structure and same 
malicious behavior The only difference is that they have 
different icons and wallpaper files. 

B. Analyzing Malware which uses Code Obfuscation: A 

malware writer can use a disassembler (e.g., ApktOOl |14|) 
to convert a .dex file into .small files, then injects new 
malware logic into the .small code, rebuilds it back to 
a . dex file. Based on this rebuilt process, malware writers 
can apply various code obfuscation techniques while preserve 
the behavior as the original one in order to bypass the anti- 
virus detection. As shown in f3l, fTS I, many mobile anti-virus 
products are not effective to detect code obfuscated variants. 

DroidAnalytics will not extract the API calls in methods 
and classes which will not be executed in run time (refer to 
Section because they are defunct and can be generated 
by obfuscators. In addition, the signature generation does 
not depend on the name of methods or classes, hence name 
obfuscation has no effect on our signature. Furthermore, the 
signature generation of DroidAnalytics is based on the analyst- 
defined API calls table. So one can flexibly update the table 
entries to defend against various code obfuscation techniques. 
Experiment. To illustrate the effectiveness of DroidAnalytics 
against code obfuscation, we chose 30 different malware from 
three different families (10 samples in each family) and Table 
iniillustrates our results. The malware families in our study are: 
Basebrid (or Basebridge), Gold-Dream, and Kungfu. We 



Levi Signature Details of Opfalie 



MD5 


Package Name 


Levi Signature 


Detection Info 


e70761dd3d89d26eed3932d54089al21 


com.fywork.xiaohun 


ec6618fc8b7de05d4b036dfd82e9a317 


Trojan-SMS.AndroidOS.Opfake.bo 
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Trojan-SMS.AndroidOS.Opfake.bo 
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Fig. 4. Screen Capture of Opfake Family Levi Signature 



Levi Signature Details of Kmin 


MD5 


Package Name 


Levi Signature 


Detection Info 


ad6efBac95566025d5521d59ad5ae5d6 


com. km. launcher 


90b3d4afl83e9f72f818c629b486fdec 


Backdoor.AndroidOS.Kmin.d 


f8alf5f5cc627e0b0c54cl9ab37e807a 


com. km. launcher 


90b3d4afl83e9f72f818c629b486fdec 


Backdoor.AndroidOS.Kmin.d 


If673b5a301ccc30a854943502797154 


com. km. launcher 


90b3d4afl83e9f72f818c629b486fdec 


Backdoor.AndroidOS.Kmin.d 


5bd29bl4b27fbdd8d36ae803b8418bc7 


com. km. launcher 


90b3d4afl83e9f72f818c629b486fdec 


Backdoor.AndroidOS.Kmin.d 


acc4787d443005f9c5949aa53c8d0234 


com. km. launcher 


90b3d4afl83e9f72f818c629b486fdec 


Backdoor.AndroidOS.Kmin.d 


fl7e520ac79bd9ec5ce7baef83011a2b 


com. km. launcher 


90b3d4afl83e9f72f8 18c629b486fdec 


Backdoor.AndroidOS.Kmin.d 


18fB0f0e9da76c63a87e7ad91fa2cae6 


com. km. launcher 


90b3d4afl83e9f72f8 18c629b486fdec 


Backdoor.AndroidOS.Kmin.d 


5a617380d2b98b5alfld7f64486ebblb 


com. km. launcher 


90b3d4afl83e9f72f818c629b486fdec 


Backdoor.AndroidOS.Kmin.d 


a32d2ed06323ca791711278797a7d608 


com. km. launcher 


90b3d4afl83e9f72f818c629b486fdec 


Backdoor.AndroidOS.Kmin.d 


fl2238101902750604ca900932cca8cc 


com. km. launcher 


90b3d4afl83e9f72f818c629b486fdec 


Backdoor.AndroidOS.Kmin.d 


1750697625b0f59c8946ba351d3al465 


com. km. launcher 


90b3d4afl83e9f72f818c629b486fdec 


Backdoor.AndroidOS.Kmin.d 


594d8403clc3ff7b56352efb9ff2e809 


com. km. launcher 


90b3d4afl83e9f72f818c629b486fdec 


Backdoor.AndroidOS.Kmin.d 









Fig. 5. Screen Capture of Kmin Family Levi Signature 



use ADAM (T6\, a system which can automatically transform 
an original malware sample to different variants by various 
repackaging and obfuscation techniques (e.g. inserting defunct 
methods, modifying methods name, ...etc). We generate seven 
different variants for each malware. Then we put these 240 
new malware samples into the DroidAnalytics system and 
check their signatures. After our signature calculations, the 
result shows that for each malware, the original sample and 
seven mutated variants have distinct MD5 hash values (3 
repackaging, 4 code obfuscation), but all of them have the 
same level 1 signature. This shows that DroidAnalytics' signa- 
ture system is effective in defending against code obfuscation. 

C. Analyzing Malware with Attachement Files or Dynamic 
Payloads: Some malware will dynamically download file 
which contains the malicious code from the Internet. Also, 
some attachment files within a package may contain malicious 
logic but they can be concealed as other valid documents (e.g., 
. png file, . wma file). DroidAnalytics will treat these files as 



dynamic payloads. By using both static and dynamic analysis 
techniques described in Section HI] DroidAnalytics accesses 
these payloads and generates different signatures. 

Experiment. We carried out the following experiment. From 
our malware database, we used our signature system and 
detected some malware contain the same file with a .png 
filename extension. But when we check the magic number of 
this file, it is actually an .elf file. Upon further analysis, 
we found that this file is a root exploit and this malware 
belongs to the GinMaster (or GingerMaster) family. Another 
example is the Plankton family. By using dynamic analysis, 
DroidAnalytics discovered that all malware in this family will 
download a plankton_vO . . 4 . jar (or similar .jar) 
when the main activity of the application starts. Further 
analysis revealed the .jar file contains malicious behavior, 
i.e., stealing browser's history information, making screen 
shortcuts and botnet logic. Table |III] depicts DroidAnalytics 
system detects some representative malware using dynamic 



MD5 


Pakage Name 


Levi Signature 


Malware Family 


Description 


f7967t71b2f32287660ae9a3fa366022 
f2e2d727f95fa868fd7ff54459e766e3 
e01f573cca83fdf2737be6ecee35fe33 
d383ceeb9c6ffcff8cOddl2b73ec43e3 
cd04054 1 693693faca9fec646e 1 2e7e6 
401 952d745cd7ca528 1 a7f08d3e2eede 
27 1 c3965c7822ebf944feb8bbd 1 cfe7f 
8bl2ccdc8a69ct2d6a7e6c()0f698aaaa 


com.tutusw.phonespeedup 
com.tutusw.phonespeedup 
com.tutusw.phonespeedup 
com.tutusw.phonespeedup 
com.tutusw.phonespeedup 
com.tutusw.phonespeedup 
com.tutusw.phonespeedup 
com.tutusw.phonespeedup 


3c83edOf80646c8ball2cb8535c293dd 
3c83edOf80646c8ball2cb8535c293dd 
3c83edOf80646c8ball2cb8535c293dd 
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3c83ed0f80646c8ball2cb8535c293dd 
3c83edOf80646c8ball2cb8535c293dd 
3c83edOf80646c8ball2cb8535c293dd 


Kungfu 
Kungfu 
Kungfu 
Kungfu 
Kungfu 
Kungfu 
Kungfu 
Kungfu 


Oiiginal malware 

Repackaging 

Repackaging 

Repackaging 

Obfuscation 

Obfuscation 

Obfuscation 

Obfuscation 



TABLE II 
Examples of Code Obfuscation 



MD5 


Dynamic Payload 


Description 


Malware Family 


34cb03276e426f 8d6 1 e782b8435d3 1 47 
a24d2ae57c3ceelcf3298c856a9 1 7 1 00 

9e847c9a27dc9898825f466ea00dac81 
dcbe 1 1 e5f3b82ce89 1 b793ea40e4975e 


/assets/ranme.png 
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TABLE III 
Examples of Dynamic Payloads 



pay loads. 

IV. Analytic Capability of DroidAnalytics 

We conduct three experiments and show how analysts can 
study malware, carry out similarity measurement between ap- 
plications, as well as perform class association among 150,368 
mobile applications in the database. 

A. Detailed Analysis on Malware: Using DroidAnalytics, an- 
alysts can also discover which class or method uses suspicious 
API calls via the permission recursion technique. 

• Common Analytics on Malware. First, using the AIS 
parser, DroidAnalytics can reveal basic information of an 
application like the cryptographic hash (i.e., MD5 value), 
package name, broadcast receiver, . . ., etc. This is illustrated 
in Figure |6] In addition, DroidAnalytics has a built-in cloud- 
based APK scanner that supports diverse anti-virus scan results 
(e.g., Kaspersky and Antiy) to help analysts for reference. Our 
cloud-based APK scanner is extensible to accommodate other 
anti-virus scan engines. Last but not least, DroidAnalytics 
can disassemble the . dex file and extract class number, 
method number, and API calls number in each application, 
class or method. These functionalities are useful so analysts 
can quickly zoom in to the meta-information of a suspicious 
malware. Figure |7] shows these functionalities. 

• Permission Recursion. Current state-of-the-art systems 
examine the AndroidManif est . xml to discover permis- 
sions of an application. This is not informative enough since 
analysts do not know which class or which method uses 
these permissions for suspicious activities. In DroidAnalytics, 
we can discover the permission within a class or a method. 
Since each permission is related to some API calls ifTTl . 



In DroidAnalytics, we tag permission to API calls in each 
method. We combine the method permissions within the same 
class as class permission, and combine all class permissions as 
application permission. This helps analysts to quickly discover 
suspicious methods or classes. 

— Experiment. In this experiment, we choose a popular 
malware family Kungfu and examine the permissions at the 
application/class/method levels. Malware in the Kungfu family 
can obtain user's information such as IMEI number, phone 
model,..., etc. It can also exploit the device and gain root 
privilege access. Once the malware obtained the root level 
access, it installs malicious application in the background as 
a back-door service. 

We use DroidAnalytics to generate all three-level signatures. 
Figure |8] shows the partial structure of the signatures with 
permission recursion of two Kungfu malware: A4E3 9D and 
D2EF8A, and together with a legitimate application BEDIE3 
(Levi signature). Firstly, based on the malware reports by our 
cloud APK scanner, we identify A4E3 9D and D2EF8A are 
malware which come from the Kungfu family with differ- 
ent package names, net . atools . android . cmwrap and 
com. atools . netXraf f icStats respectively. By ana- 
lyzing the Lev2 signature, we discover that BCEED is the 
common class which is within the two Kungfu applications. 
Secondly, from the Lev3 signature of BCEED, we use the 
permission recursion method to expose the method 6F100 
with the INSTALL_PACKAGES permission. Note that the 
INSTALL_PACKAGES permission is a system permission 
which the application can install other unrelated applications. 
This shows how DroidAnalytics helps analysts to quickly 
discover the malicious code of methods and classes of the 
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Fig. 6. Screen Capture of Common Analytics on Malware 



Kungfu family. 

• Similarity Measurement. Droid Analytics can also calculate 
the similarity between two Android mobile applications, and 
the computation is based on the three level signatures that 
we discussed. By comparing the similarity of applications, we 
can determine whether an application is repackaged malware. 
Moreover, because we use Lev2 (class) signature as the 
basic building block, our approach not only can provide the 
similarity scores between two applications, but can inform 
analysts the common and different code segments between 
two applications. This great facilitates analysts to carry out 
detailed analysis. 

Our similarity score is based on the Lev2 signatures which 
represent all classes within an application. The similarity score 
is based on the Jaccard similarity coefficient. Given fa and 
fb as two Lev2 signature sets of two applications a and b 
respectively, fa^fb refers to the same Lev2 signatures of a and 
b (or the common classes of these two applications), while fa U 
fb represents the set of classes of these two applications. S{x) 
is a function which returns the total number of API calls in the 
set X. Our similarity score equation between two applications 



is: 



Japp if a 1 fb) 



Sjfa n fb) 
S{fa U fb) ■ 



(1) 



Let us how to use this similarity score to carry out analysis. 
First, given a legitimate application X, we can find all appli- 
cations {Yi,Y2, ...jYk} which are the top p% (say p — 20) 
applications that are similar to X. Second, in the procedure 
of calculating similarity, we can identify the common and 
different Lev2 signatures. As a result, repackaged or mutated 
malware (i.e., Yi), can be easily detected using the similarity 
score and the classes of malicious behavior can be easily 
determined. 

— Experiment. We carry out the experiment using similarity 
measurement based on the Lev2 and Lev3 signatures. We use 
a benign application and calculate the similarity with other 
applications in our database to find all repackaged malware 
of this benign application. Furthermore, we can discover the 
differences between this benign application and repackaged 
malware (at the code level) to see what malicious codes are 
repackaged into legitimate application. 

We choose a legitimate application called "Touch alarm" 
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Fig. 7. Screen Capture of Detailed Lev2 Signature 
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Fig. 8. Illustration of Detail Analytics on Malware 



which is downloaded from Google official market. We use 
DroidAnalytics to compute the similar scores with malware in 
the database. Table |IV] shows the similarity scores between 
"Touch alarm" and other applications in our database (in 
decreasing order of similarity). From the table, we can clearly 
observe that the package names of the top six applications 
are the same: com .k.99k;.k;eel .touch. alert . freeze. 
The first one in the table is the legitimate "Touch alarm" 
application, while the following five applications are repack- 
aged malware of Adrd. Adrd steals the personal information 
like IMEI, hardware information of users, and it will encrypt 
the stolen information and upload to some remote server 
Moreover, it may dynamically download the newest version 



of Adrd and update itself. The table shows that the malware 
writer inserted malicious codes in benign "Touch alarm" and 
repackaged it to different variants of Adrd. 

By using similarity measurement based on the three-level 
signature, DroidAnalytics reveals the difference between two 
applications at the code level. Table |V] shows the comparison 
of legitimate application "Touch alarm" and repackaged 
malware. Highlighted rows are the different level 2 signatures 
of the two applications, while the other rows represent same 
signature of common classes. This shows that DroidAnalytics 
can easily identify different classes of the two applications. 
By using permission recursion which we discussed 
previously, analysts can discover that the different level 
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TABLE IV 

Similarity Scores of "Touch Alarm" and other applications 



2 signature e48040acb2d761fedf a0e9786dd2f 3c2 
has READ_PHONE_STATE, READ_CONTACTS and 
SEND_SMS in repackaged malware. Using DroidAnalytics 
for further analysis, we find suspicious API calls like 
android/ telephony/TelephonyManager ; -> 
getSimSerialNumber, android/telephony /gsm/ 
SmsManager; ->endTextMessage and 
android/ telephony/TelephonyManager; -> 
getDevicelD. Last but not least, we determine the 
malware writer inserted these malicious codes into legitimate 
application and published repackaged malware to various 
third party markets. 

• Class Association. Traditional analysis on malware only 
focuses on one malware but can not associate malware with 
other malware or applications. DroidAnalytics can associate 
legitimate applications and other malware in the class level 
and/or method level. Given a class signature (or Lev2 sig- 
nature), DroidAnalytics keeps track of how many legitimate 
applications or malware using this particular class. Also, with 
the methodology of permission recursion, DroidAnalytics can 
indicate the permission usage of this class signature. By using 
class association, we can easily determine which class or 
method may possess malicious behavior, and which class is 
used for common task, say for pushing advertisement. Lastly, 
for class signatures which are used by many known malware 
but zero legitimate application, these are classes that analysts 
need to pay special attention to because it is very likely that 
they contain malicious code and are used in many repackaged 
or obfuscated malware. 

— Experiment. Using DroidAnalytics, we carry out 
the class association experiment using 1,000 legitimate 
appliactions and 1,000 malware as reported by 
Kaspersky. Figure |9] illustrates the results. After the 
class association, we discover a class (Lev2) signature 
2bcb4f 8940f 00fb7f 50731ee341003df which is 
used by 143 malware and zero legitimate application. 
In addition, the 143 malware are all from the Geinimi 
family. Furthermore, this class has 47 API calls and uses 
READ_CONTACTS and SEND_SMS permissions. Therefore, 
we quickly identify this class contains malicious codes. In 
the signature database, we also find a class (Lev2) signature 
9067f 7292650ba0b5c725165111ef 04e which is used 
by 80 legitimate applications and 42 malware. Further analysis 



shows that this class is used by similar number of legitimate 
applications and malware, and this class uses an advertisement 
library called DOMOB [[l8l. Another class (Lev2) signature 
a007d9e3754daef 90ded300190 9034 51 is used by 105 
legitimate applications and 80 malware. Further examination 
shows that it is a class from the Google official library called 
AdMob lUl. This is illustrated in Figure [Tol 

In summary, all experimented samples are presented in Ta- 
ble |VI] which represents the known malware in our database. 
The detection results is based on the cloud anti-virus en- 
gine using various detection engines (i.e., Kaspersky and 
Antiy(linux version)). Note that in the last column, R (G) 
represents repackaged (generic) malware family. For those 
malware families with less than five samples, we lump them 
as "others" in the table. Antiy (linux version) ||201 is a com- 
mercial anti-virus product that we obtained from the company, 
and the product is known to run the same engine for detecting 
malware in smartphones. The rows are sorted in alphabetical 
order Highlighted rows show the common malware families 
detected by both Kaspersky and Antiy. Others are uniquely 
detected by one anti-virus product. The penultimate row shows 
there are 1,295 common malware samples detected by these 
two anti-virus products. Hence, the number of unique malware 
samples is 2,148. 

V. Zero-day Malware Detection 

Here, we show a novel methodology in using DroidAna- 
lytics to detect the zero-day repackaged malware. We analyze 
three zero-day malware families to illustrate the effectiveness 
of our system. 

A. Zero-day Malware 

Zero-day malware is a new malware that current commercial 
anti-virus systems cannot detect. Anti-virus software usually 
relies on signatures to identify malware. However, signature 
can only be generated when samples are obtained. It is always 
a challenge for anti-virus companies to detect the zero-day 
malware, then update their malware detection engines as 
quickly as possible. 

In this paper, we define an application as a zero-day mal- 
ware if it has malicious behavior and it cannot be detected by 
popular anti-virus software (e.g., Kaspersky, NOD32, Norton) 
using their latest signature database. As of November, 2012, 
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TABLE V 

Comparison Between Legitimate App "Touch Alarm" and Repackaged Malware 
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Fig. 9. Detailed Lev2 Signature of Repackaged Geinimi Malware 



we use Droid Analytics and have successfully detected 342 
zero-day repackaged malware in five different families: AisRs, 
Aseiei, AlProvider, G3app, GSmstracker and YFontmaster 
(please refer to Table I VII I for reference). In this paper, we 
use the name of the injected package (not the name of the 
repackaged applications) as the name of its malware family. 
Furthermore, all samples are scanned by Kaspersky, NOD32, 
Norton and Antiy using their latest database in November, 
2012. We also uploaded these samples to the virustotal [21J 
for malware detection analysis. Note that none of the submitted 
samples was reported as a malware by these engines when we 
caiTied out our experiments. 

In 1(221, ED, authors reported that nearly 86.0% of all 
Android malware are actually repackaged versions of some 
legitimate applications. By camouflaging to some legitimate 
applications, repackaged malware can easily deceive users. 
Given the large percentage of repackaged malware, we explore 
the effectiveness of using DroidAnalytics to detect the zero- 
day repackaged malware. 



B. Zero-Day Malware Detection Methodology 

The process of detecting zero-day repackaged malware can 
be summarized by the following steps. 

Step 1: We first construct a white list for common and 
legitimate classes. For example, we add all legitimate level 
2 signatures, such as those in utility libraries (e.g., Json 
library) or advertisement libraries (e.g., Google Admob library, 
Airpush library) to the white list. All level 2 signatures in the 
white list will not be used to calculate the similarity score 
between two applications. 

Step 2: We calculate the number of common API calls 
between two given applications in the database. This can be 
achieved by using the similarity score in Equation (|2). 

Sapp{faJb) = S{far\h). (2) 

where fa and /b as two level 2 signature sets of two applica- 
tions a and h respectively, S{x) is a function to indicate the 
total number of API calls in the set x. The above similarity 
score between two repackaged malwares focus on the com- 
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Fig. 10. Detailed Lev2 Signature of Advertisement Library 



mon repackaged API calls that correspond to the malicious 
logic, and ignore the effect of other API calls in these two 
applications. 

Step 3: Assume we have N applications in the database, then 
we start with N clusters. The distance between two clusters 
is the similarity score we mentioned in Step 2. We select 
two applications which have the largest similarity score and 
combine them into one cluster For this new cluster, we re- 
calculate the similarity score between this new cluster with 
other N — 2 clusters. The new similarity score is computed by 
averaging all similarity scores between all applications in two 
different clusters. 

Step 4: Again, we combine two clusters which have the largest 
similarity score. We continue this step until the similarity score 
between any two clusters is less than a pre-defined threshold 
T (say T is 100). 

Step 5: After we finish the clustering process, we will use 
anti-virus engines to scan all of these N applications. Each 
application may be classified as legitimate or malicious. 
Step 6: If a cluster has more than n applications (say n = 10) 
and a small fraction / (say / < 0.2) is classified as malicious. 
This should be a suspicious cluster since it is very unlikely 
that more than n applications are similar (in terms of class 
functionality) in the real-world, and most of them are classified 
as benign. Hence, the similarity comes when some of these 
applications are repackaged. We then extract their common 
classes (using our level 2 signature) and examine these classes. 
Once we find any malicious logic in these common classes, 
we discover a zero-day repackaged malware family. 
Experimental results in discovering zero-day repackaged 
malware: Let us present the results of using DroidAnalytics 



to discover three zero-day repackaged malware families. 
- AisRs family: We discover 87 samples of AisRs 
family in our database. All the malware are repack- 
aged from legitimate applications (e.g., com.fengle.jumptiger, 
com.mine.videoplayer) and all of them have a common mali- 
cious package named "com . ais . rs". This malware contains 
a number of botnet commands that can be remotely invoked. 
When the malware runs, it will first communicate with two 
remote servers (see Figure ITTl . These two servers are camou- 
flaged as software download websites(see Figure [12). If any 
of these two servers is online, the malware will receive some 
commands like downloading other . apk files. They are not 
necessary malware, but they contain advertisement from the 
website, ' http://push.aandroid.nef'. It is interesting to note that 
the address is "aandroid.net", not "android.net". Also, one of 
the many botnet commands is to save the user's application 
installation lists and system information to the . SQLite file, 
then upload this file to a remote server 

AlProvider family: We discover 51 samples of 
Al Provider family in our database. All the malware 
are repackaged from legitimate applications (e.g., 
jinho.eye_check, com.otiasj.androradio) and all of 
them have a common malicious package named 
"com . android . internal . provider". There are 
several interesting characteristics of this malware. Firstly, the 
malicious package name is disguised as a system package 
name. Since DroidAnalytics does not detect malware based 
on the package name, so our system can easily discover this 
repackaged malware. Secondly, this malware uses DES to 
encrypt all SMS information (e.g., telephone numbers, SMS 
content) and store them in the DESUtils class. Thirdly, 
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Others (42 families) 


71 


Others (27 families) 


44 


Common 


1295 


Common 


1295 


All 


2003 


All 


1440 



TABLE VI 

Detection Results of our Cloud-based Apk Scanner 



Family Name 


Number 


Malicious Package Name 


AisRs 


87 


com.ais.rs 


Aseiei 


64 


com. aseiei 


AlProvider 


51 


com . android.intemal .pro vider 


G3app 


96 


com.g3app 


GSms tracker 


10 


com.gizmoquip.smstracker 


YFontmaster 


34 


com.yy.fontmaster 


AH 


342 





TABLE VII 

Zero-day Repackaged Malware Samples 
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Fig. 11. Remote Servers of AisRs 



the malware will start a service called OperateService 
in the background when it receives "BOOT_COMPLETED" 
broadcast(see Figure [13] and [T4] |. This service will decrypt 
the SMS information in the DESUtils class and use this 
information to send SMS messages without any notification. 
- G3app family: We discover 96 samples of G3app family 
in our database. All the malware are repackaged from le- 
gitimate applications (e.g., com.opengS.virtua llncomingCall, 
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Fig. 12. Camouflaged Software Download Websites 
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Fig. 13. Disassembled Repackaged Codes of AlProvider 



the repackaged application or the notification in the status bar, 
the malware will download other applications from the remote 
server We believe the hackers want to use repackaged malware 
to publicize their applications and use these advertisements for 
financial gain. 





ml [ 


3 12:15 








2.1MB 


Services 


■ 


OperateService 

started by application. 


04:35 


This service was started by its 
application. Stopping it may cause 
the application to fail. 


stop 1 R t 1 


Processes 


■ 


eTooIs 

com. BinarySheep.eTool 


2.1MB 


Main process that is in use. 





Fig. 14. Malicious OperateService of AlProvider 



com.cs.android.wc3) and all of them have a common malicious 
package named "com.gSapp". There are several malicious 
behaviors in the G3app malware family. Firstly, the malware 
will frequently pop up notification on the status bar and entice 
users to select it (see Figure [TSl l. Secondly, the malware 
will inject trigger codes to every button of the legitimate 
application(see Figure [TSl l. If the user presses any button in 
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Fig. 15. Trigger Notification of G3app 
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Fig. 16. Trigger Button of G3app 



VI. Related Work 

Before the rapid increase of Android malware in 2011, 
researchers focused on the permission and capability leaks 
of Android applications. E.g., David Barrera et al. |24| pro- 
pose a methodology to explore and analyze permission-based 
models in Android. Stowaway lilTl is a tool for detecting 
permission over privilege in Android, while ComDroid [25] 
is a tool which detects communication vulnerabilities. Wood- 
pecker ||26l analyzes each application on a smartphone to 
explore the reachability of a dangerous permission from a 
public, unguarded interface. William Enck et al. ETl propose 
a lightweight mobile phone application with a certification- 
based permission. In this paper, instead of malware detection, 
we focus on designing an analytic system that helps analysts 
to dissect, analyze and correlate Android malware with other 
Android applications in the database. We propose a novel 
signature system to identify malicious code segments and 
associate with other malware in the database. Our signature 
system is robust against obfuscation techniques that hackers 
may use. 

In August 2010, Kaspersky reported the first SMS Trojan, 
known as FakePlayer in Android systems |28|. Since then, 
many malware and their variants have been discovered and 
mobile malware rapidly became a serious threat. Felt et al. 
study 18 Android malware in |29|. Enck et al. |[30 l carry 
out a study with 1,100 android applications but no malware 
was found. Zhou Yang et al. study characterization and 
evolution of Android malware with 1,260 samples. However, 
they did not show how to systematically collect, analyze and 
correlate these samples. 

Yajin Zhou et al. fy2\ were the first to present a sys- 
tematic study for the detection of malicious applications on 
Android Markets. They successfully discover 211 malware. 
Their system, DroidRanger, needs malware samples to extract 
the footprint of each malware family before the known mal- 
ware detection. For zero day malware, DroidRanger serves 
as a filtering system. After the filtering process, suspicious 
malware needs to be manually analyzed. DroidMOSS |33| is 
an Android system to detect repackaged applications using 
fuzzy hashing. As stated in |33|, the system is not designed for 
general malware detection. Furthermore, the similarity score 
provided by DoridMoss is not helpful in malware analysis. 
In addition, obfuscation techniques can change the order of 
classes and methods execution, and this will introduce large 
deviation in the measure used in DroidMOSS. Michael Grace 
et al. develop RiskRanker [34] to analyze whether a particular 
application exhibits dangerous behavior It uses class path 
as the malware family feature to detect more mutations. 
However, obfuscation can easily rearrange the opcode along 
an execution path. So using class path for malware feature is 
not effective under obfuscation attack. Our system overcomes 
these problems by using a novel signature algorithm to extract 
the malware feature at the opcode level so it captures the 
semantic meaning for signature generation. 

For PC based malware, a lot of research work focus on 



the signature based malware detection. For example, authors 
in [TSl discussed the limitation of using signature to detect 
malware. In [35 1, authors described the obfuscation techniques 
to hide malware. Authors in ll36l presented an automatic 
system to mine specifications of malicious behavior in mal- 
ware families. Paolo Milani Comparetti et al. [37.1 proposed 
a solution to determine malicious functionalities of malware. 
However, mobile malware has different features as compared 
with PC based malware. It is difficult to transform a PC based 
malware detection solution for mobile devices. For example, 
[3 1 reported that many anti-virus products have poor perfor- 
mance in detecting Android malware mutations, although these 
products performed reasonably well for PC based malware. 
Repackaging is another characteristic of android malware. Au- 
thors in 1 33] showed there are many repackaged applications 
in Android third party markets and significant number of these 
applications is malware. Based on the above studies, it is clear 
that a more sophisticated methodology to detect and analyze 
Android malware is needed. 

VII. Conclusion 

We present DroidAnalytics, an Android malware analytic 
system which can automatically collect malware, generate 
signatures for applications, identify malicious code segment 
(even at the opcode level), and at the same time, associate the 
malware under study with various malware and applications in 
the database. Our signature methodology provides significant 
advantages over traditional cryptographic hash like MD5- 
based signature. We show how to use DroidAnalytics to 
quickly retrieve, associate and reveal malicious logics. Using 
the permission recursion technique and class association, we 
show how to retrieve the permissions of methods, classes 
and application (rather than basic package information), and 
associate all applications in the opcode level. Using DroidAna- 
lytics, one can easily discover repackaged applications via the 
similarity score. Last but not least, we have used DroidAnalyt- 
ics to detect 2,494 malware samples from 102 families, with 
342 zero-day malware samples from six different families. 
We have conducted extensive experiments to demonstrate the 
analytic and malware detection capabilities of DroidAnalytics. 
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